We study ensemble-based graph-theoretical methods aiming to approximate the size of the minimum 
dominating set (MDS) in scale-free networks. We analyze both analytical upper bounds of dominating sets 
and numerical realizations for applications. We propose two novel probabilistic dominating set selection 
strategies that are applicable to heterogeneous networks. One of them obtains the smallest probabilistic 
dominating set and also outperforms the deterministic degree-ranked method. We show that a 
degree-dependent probabilistic selection method becomes optimal in its deterministic limit. We also find
the precise limit where selecting high-degree nodes exclusively becomes inefficient for network 
domination. We validate our results on several real-world networks, and provide highly accurate analytical 
estimates for our methods. 

It is a critical task in network science and its applications to find methods to efficiently detect, monitor and 
control the behavior of nodes in networks. Finding small dominating sets on static or slowly evolving networks 
is an effective approach in achieving these objectives. A dominating set of a network G with node set V is a 
subset of nodes $S \subseteq V$, such that every node not in S is adjacent to at least one node in S, while the minimum 
dominating set (MDS) is the smallest cardinality dominating set. Dominating sets provide key solutions to 
various critical problems in networked systems, such as network controllability 1-4, observability of the power 
grid 5, social influence propagation 6,7, optimal sensor placement for disease outbreak detection 8, distributed 
allocation of network resources 9, and finding high-impact optimized subsets in protein interaction networks 10. 
The effective use of dominating sets in these problems demands profound understanding of the behavior of 
dominating sets with respect to various network features, as well as developing effective methods for finding 
different types of dominating sets that are optimal solutions for different problems. 

In most applications that utilize dominating sets, the main goal is to minimize the number of selected 
dominator nodes, because implementing dominators usually incurs some form of per-node cost. However, 
finding the MDS of a network is a well-known NP-hard problem in graph theory. It was proven that finding a 
sublogarithmic approximation for the size of MDS is also NP-hard, but a logarithmic approximation can be 
efficiently found by a simple greedy search algorithm 11-13. While research is focused on finding better 
approximations to the MDS 14,15 and minimum connected dominating sets 16-20 (applicable to wireless communication 
and sensor networks), and developing exponential algorithms to find the exact MDS 21-24, it remains a 
fundamental challenge to develop cost-efficient strategies for selecting dominators in a network. 

In this work, we consider the additional factor of local connectivity information availability that affects the cost 
of finding dominating sets. Efficient dominating set search algorithms require full knowledge of network struc- 
ture and connectivity patterns (i.e., adjacency matrix, or equivalent adjacency information). Obtaining this 
information in large networks (over tens of millions of nodes) involves additional expenses that can ultimately 
lead to overall suboptimal costs. In addition, sophisticated search methods tend to have polynomial computa- 
tional time complexity with high orders in the number of nodes or edges, therefore their applicability to large real 
networks is questionable. Our present study is aimed towards designing dominating set selection strategies that 
satisfy the cost-efficiency demands in terms of required connectivity information, computational complexity, and 
the size of the resulting dominating set. We develop these methods for selecting dominators in heterogeneous 
networks, particularly in scale-free networks, described by a power-law degree distribution [$P(k) \sim k^{-\gamma}$]. 
Networks with this fundamental property appear in numerous real-world systems, including social, biological, 
infrastructural and communication networks. Here we show that the degree-dependent probabilistic selection 
method becomes optimal in its deterministic limit. We also find the precise limit where selecting 
high-degree nodes exclusively becomes inefficient for network domination. 

Literature provides detailed analysis on the bounds of dominating 
sets in various types of networks 25 with respect to structural 
properties. Cooper et al. 26 analyzed the behavior of MDS in model 
scale-free networks created by the preferential attachment rule 27, which 
generates networks with power-law exponent $\gamma = 3$. They found that the 
MDS size is bounded above and below by functions linear in N, where 
N denotes the number of nodes in the network. Similar research has 
been conducted on random regular graphs and Erdős-Rényi (ER) 28 
graphs. Zito 29 studied the size of the minimum independent 
dominating set on r-regular random graphs with $3 \le r \le 7$ and 
demonstrated that the size of this set, and consequently the size of the MDS, is 
upper bounded by a linear function of N. Later, Biro et al. 30 improved 
the prefactor of the O(N) bound of the size of MDS in r-regular 
graphs using a greedy algorithm 11,12,31,32. In addition, Wieland 
et al. 33 derived general bounds for dense ER graphs using fixed edge 
probability and demonstrated that the MDS size scales as log N. 
However, this result cannot be applied to sparse graphs with fixed 
average degrees. 

Recent studies 34,35 analyzed the scaling behavior of MDS in 
scale-free networks with a wide range of network sizes and degree 
exponents. It was found that the MDS size decreases as $\gamma$ is lowered, and in 
certain special cases when the network structure allows the presence 
of O(N) degree hubs (when $\gamma < 2$), the MDS size shows a transition 
from linear to O(1) scaling with respect to network size, making these 
heterogeneous networks very easy to control. However, the impact of 
network assortativity, which is a fundamental property in real 
networks, has not been studied. 

In complex networked systems, mixing patterns are usually 
described by assortativity measures. A network is considered 
assortative if its nodes tend to connect to other nodes which have a similar 
number of connections, while in a disassortative network the high 
degree nodes are adjacent to low degree nodes. Investigating the 
behavior of dominating sets with respect to assortativity is essential 
for deeper understanding of the network domination problem. 
Several studies conducted on real-world networks have shown that 
social systems are assortative, while technological ones exhibit 
disassortative behavior 41. Social psychology studies have shown that 
humans are more likely to establish a connection with individuals 
from the same social class, or with whom they share common inter- 
ests, such as education or workplace. This tendency, named homo- 
phily, also governs the attachment rules in real-life social systems, 
and it is reflected in the mixing patterns of these networks, which are 
of significant importance in dynamical processes on social networks. 
Specific connectivity schemes affect influence propagation and epi- 
demic spread 37,38 , and are also responsible for Web page ranking 39 
and internet protocol performance 40 . 

Newman proposed a method to quantify assortativity in networks 
using a Pearson correlation between degrees at the end of edges 41,42 , 
which he defined as the assortativity coefficient. However, a recent 
study of Litvak and van der Hofstad 43 has shown that this coefficient 
has limited applicability (only for finite variances) and is also 
dependent on network size. In order to resolve these biases, they 
proposed a new approach to measure assortativity based on 
Spearman's $\rho$ 44, which is a Pearson correlation coefficient between 
ranked variables. This method provides consistent assortativity 
values, irrespective of network size, thus allowing assortativity 
comparison between various network sizes. In addition, it can also reveal 
strong dependencies more efficiently in large networks. Therefore, 
we also use Spearman's $\rho$ as the assortativity measure in our work. 
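
As an illustration of the measure described above, the following is a minimal sketch of a Spearman-type degree assortativity: a Pearson correlation between the ranks of the degrees found at the two ends of every edge. The function names and the adjacency representation (a dict mapping each node to a set of neighbors) are assumptions made for this example, and ties are resolved by average ranks, which is one common convention and may differ from the tie handling used in the cited work.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0          # average of positions i..j, 1-based
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; returns NaN if it is undefined."""
    n = len(xs)
    if n == 0:
        return float("nan")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman_assortativity(adj):
    """Spearman-type correlation of degrees at the two ends of each edge."""
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:                     # every undirected edge is seen in both directions
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    return pearson(average_ranks(xs), average_ranks(ys))
```

A star graph, where the hub connects only to degree-one leaves, is fully disassortative under this measure, while a graph made of disconnected cliques of different sizes is fully assortative.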

Here, we also develop and employ a new method to efficiently 
control assortativity in network ensembles. Using this technique, 
our goal is to provide a large-scale analysis on the behavior of various 
dominating sets, with respect to a wide range of network parameters, 
including assortativity. Finally, we also compare our findings on 
model scale-free networks and real-world network samples. 

Results 

We start our study by considering potential directions on how to 
build dominating sets in a network without full adjacency informa- 
tion. We must select nodes based solely on their individual prop- 
erties, such as the node degree, and potentially a limited amount of 
global network information, such as the number of nodes, average 
degree, and power-law degree exponent. We construct our probabil- 
istic methods (and their deterministic limit) based on this 
information. 

Probabilistic Dominating Sets. The results of Alon and Spencer 12 
provide a graph-theoretical approach to finding an upper bound for 
the size of the minimum dominating set, and as part of it they 
propose a probabilistic method for selecting dominator nodes. 
While their approach is theoretical, we can carry out their method, 
numerically, to obtain a probabilistic dominating set, and study its 
properties in scale-free networks. 

Finding a probabilistic (random) dominating set (RDS) in a graph 
has the following steps. First, we visit each node, and add it to an 
initially empty set X, with probability p (a parameter chosen 
arbitrarily, $p \in [0, 1]$), independently of other nodes. Then, the remaining 
nodes that are neither in X nor adjacent to any node in X are placed in set 
Y. The dominating set is obtained as $X \cup Y$. Alon and Spencer 
showed 12 that the expected size of this set is 

$$ |RDS| = |X| + |Y| \le Np + N(1-p)^{k_{\min}+1}, \tag{1} $$

where $k_{\min}$ is the minimum degree and N is the number of nodes in 
the graph. By differentiation of this formula with respect to p we can 
find the optimal p value that minimizes |RDS| (the corresponding 
dominating set is denoted by oRDS; o stands for optimal), which is 
then further bounded from above: 

$$ |MDS| \le |oRDS| \le N\left[1 - k_{\min}\,(1+k_{\min})^{-1-1/k_{\min}}\right]. \tag{2} $$
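
The optimization step behind Eq. (2) can be written out explicitly. The following is a worked sketch of the calculus only; the shorthand $f(p)$ for the per-node bound of Eq. (1) and the symbol $p^{*}$ for its minimizer are introduced here for convenience and are not notation from the original text.

```latex
\begin{align}
f(p) &= p + (1-p)^{k_{\min}+1}, \\
f'(p) &= 1 - (k_{\min}+1)(1-p)^{k_{\min}} = 0
  \;\Longrightarrow\; p^{*} = 1 - (1+k_{\min})^{-1/k_{\min}}, \\
f(p^{*}) &= 1 - (1+k_{\min})^{-1/k_{\min}} + (1+k_{\min})^{-1-1/k_{\min}}
          = 1 - k_{\min}\,(1+k_{\min})^{-1-1/k_{\min}} .
\end{align}
```

For example, $k_{\min} = 1$ gives $p^{*} = 1/2$ and $f(p^{*}) = 3/4$, so in that case the bound guarantees a dominating set of at most $3N/4$ nodes.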

Our numerical results on scale-free network samples in comparison 
with the analytical values are shown in Fig. 1 for one set of network 
parameters, while plots over a wide range of parameters ($2 \le \gamma < 4$; 
$4 < \langle k \rangle < 16$) are included in the Supplementary Information, 
Figures S1 and S2. We find that our numerically obtained RDS size 
is substantially lower than the analytical one, for most of the studied 
combinations of network parameters. However, when p > 0.5 the size of 
the RDS found numerically closely approaches the analytical curve, 
in every case. 

We also compare the size of RDS to other dominating sets. We find 
that the sequential greedy method that approximates the MDS (and 
uses adjacency information) is still far more efficient than RDS. 
However, we find that the simple degree-ranked dominating set 
(DDS) is outperformed by RDS for optimally chosen p values. 

Degree-Dependent Random Dominating Sets. To improve the 
results of RDS we have to consider that complex networks are hetero- 
geneous, and it would be beneficial to exploit this characteristic in the 
probabilistic node selection strategy. We propose a novel degree- 
dependent probability function for selecting nodes that are placed 
in set X: 

$$ p_i = \min\left\{1,\; p\left(\frac{k_i}{k_{\max}}\right)^{\beta}\right\}, \tag{3} $$

where $k_i$ is the degree of node i, $k_{\max}$ is the maximum degree in the 
network, and p and $\beta$ are parameters. Note that we no longer require 
p to be a probability but rather a prefactor that can have any positive 
value. Similarly to the case of degree-independent selection 
probability, set Y contains nodes that are not dominated by X, and 
the ultimate result, RDS, is obtained as $X \cup Y$. The main feature of the 
new node selection probability is that nodes with higher degrees are 
more likely to be selected, which is generally desired to lower the total 
number of dominators. Note that when p > 1, we can have 
$p_i = 1$, in which case node i is surely selected. 

Figure 1 | Analytical [black dashed curve, Eq. (1)] and experimental 
[black solid curve] dominating set sizes as a function of node selection 
probability in random dominating sets (RDS). The analytic optimal upper 
bound |oRDS| [Eq. (2)] is indicated by the horizontal black dashed line. 
The sizes of MDS, DDS, and the analytical estimate of the minimum of RDS 
are also presented for comparison. Results are averaged over 200 realizations 
of scale-free networks with no structural cutoff (CONF); N = 5000, $\langle k \rangle = 14$, 
$\gamma = 2.5$, with 20 dominating set searches, averaged for every network 
sample. 

Figure 2 compares RDS with degree-dependent and 
degree-independent node selections for a wide range of $\beta$ values (note that $\beta = 0$ is 
identical to the degree-independent case). In agreement with our 
expectations, our results clearly show that degree-dependent node 
selection provides a much smaller dominating set than the simple 
degree-independent selection, and thus it also outperforms the 
degree-ranked selection. We can also observe that as the $\beta$ parameter 
is increased the smallest possible RDS size decreases, and it 
approaches the greedily approximated MDS size. Notice, however, 
that for finding the smallest possible RDS the value of p has to 
increase as well. 

Cutoff Dominating Sets (CDS). Since the smallest RDS size 
obtained seems to become lower for ever increasing $\beta$ values, we 
expect to find the minimum with $\beta \to \infty$. Notice that in this case 
all nodes with degree $k_i > k_{\max}\, p^{-1/\beta}$ are selected with probability 1 
and nodes with smaller degrees are selected with probability 0. Thus, 
we have a degree threshold, $\kappa = k_{\max}\, p^{-1/\beta}$, that now deterministically 
decides whether nodes will be added to set X or not. Notice that we 
can use this $\kappa$ to reparametrize the node selection probability in RDS 
as well: Eq. (3) now becomes 

$$ p_i = \min\left\{1,\; \left(\frac{k_i}{\kappa}\right)^{\beta}\right\}. \tag{4} $$

This form shows even more explicitly that the $\beta \to \infty$ case transforms 
the probabilistic selection into a deterministic one based on the $\kappa$ 
degree cutoff. Therefore, we call the final result a cutoff dominating 
set (CDS). 
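
The equivalence of the two parametrizations, and the way a large exponent turns the selection into a hard degree cutoff, can be checked with a few lines of code. This is a minimal sketch; the function names `p_select`, `kappa_from` and `p_select_kappa` are hypothetical and chosen only for this illustration.

```python
def p_select(k, k_max, p, beta):
    """Eq. (3): selection probability min(1, p * (k / k_max)**beta)."""
    return min(1.0, p * (k / k_max) ** beta)


def kappa_from(p, beta, k_max):
    """Degree threshold kappa = k_max * p**(-1/beta); requires beta > 0 and p > 0."""
    return k_max * p ** (-1.0 / beta)


def p_select_kappa(k, kappa, beta):
    """Eq. (4): the same probability written with the degree cutoff kappa."""
    return min(1.0, (k / kappa) ** beta)
```

With a fixed cutoff, say `kappa = 10`, raising `beta` pushes `p_select_kappa(9, 10, beta)` toward 0 while `p_select_kappa(11, 10, beta)` stays at 1, which is the deterministic CDS rule.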

Figure 3 shows CDS in comparison with RDS for various $\beta$ values; 
similar plots for a wide range of parameters ($2 < \gamma < 4$; $4 \le \langle k \rangle \le 16$) 
are included in the Supplementary Information, Figures S3 and S4. 
We can see that CDS indeed provides the smallest dominating set size 
among probabilistic methods, and when $\kappa$ is optimal (i.e., it 
minimizes the size of CDS) the size of CDS almost reaches the greedy 
MDS approximation. Considering how much simpler CDS is 
compared to the greedy approximation, this result is quite remarkable. 

Figure 2 | Size of random dominating sets (RDS) as a function of the p 
prefactor in the degree-dependent node selection probability [Eq. (3)]. 
Data is averaged over 200 samples of scale-free networks with no structural 
cutoff (CONF), and 20 repetitions of dominating set searches for each 
sample. Network parameters: N = 5000, $\langle k \rangle = 14$ and $\gamma = 2.5$. 

In order to further validate the performance of CDS, we calculate it 
on several real-world network samples and compare it to RDS, as well 
as greedy MDS approximation and DDS. We use three collaboration 
networks from the Stanford large network dataset collection 4 ', 
namely the scientific co-authorship networks of Astrophysics 
(ca-AstroPh), Condensed Matter (ca-CondMat) and High Energy 
Physics (ca-HepPh). Figure 4 shows these results. In all cases, we 
see the same behavior of CDS as in synthetic networks: CDS reaches 
the smallest possible size of all probabilistic dominating sets, and in 
some cases, it can get very close to the greedy MDS approximation. 

Figure 3 | Cutoff dominating set (CDS) as a function of the $\kappa$ degree cutoff 
parameter in the degree-dependent node selection probability [Eq. (4)]. 
For comparison, curves of RDS are plotted for various $\beta$ values. CDS 
corresponds to $\beta = \infty$. Results are averaged over 200 realizations of 
scale-free networks with no structural cutoff (CONF); N = 5000, $\langle k \rangle = 14$, 
$\gamma = 2.5$, with 20 dominating set searches, averaged for every network sample. 

Analytical Estimates of RDS and CDS. Since both RDS and CDS 
require only the degree of each node to decide whether to place that 
node in the X set, we can estimate the size of RDS and CDS in the 
infinite network size limit using continuous degree distributions. In 
general, we can estimate the size of any probabilistic dominating set 
in a network with any degree distribution and degree correlations as 
follows: 



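$$ \frac{\langle DS \rangle}{N} = \int_{k_{\min}}^{k_{\max}} X(k)\,P(k)\,dk + \int_{k_{\min}}^{k_{\max}} \bigl(1 - X(k)\bigr)\,P(k) \left[ \int_{k_{\min}}^{k_{\max}} \bigl(1 - X(k')\bigr)\,P(k'|k)\,dk' \right]^{k} dk, \tag{5} $$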



where P(k) is the degree distribution on the domain of $[k_{\min}, k_{\max}]$, 
X(k) is the probability of selecting a node with degree k into set X, and 
P(k'|k) is the degree distribution of the neighbors of a node with 
degree k. The first integral calculates the expectation of |X|/N, 
while the rest is the expectation of |Y|/N. The latter is obtained by 
counting the nodes that are not in X (the first part), but only those 
that also have no neighbors in X (the expression in square brackets is the 
probability that a single neighbor is not in X, and a node of degree k has k neighbors). 

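We can plug in the properly normalized power-law degree 
distribution in P(k). Further, for uncorrelated networks we have $P(k'|k) = k' P(k') / \langle k \rangle$. 
For RDS with uniform node selection probability we 
have X(k) = p, resulting in: 

$$ \frac{\langle RDS \rangle}{N} = p + \frac{(1-p)(1-\gamma)}{k_{\max}^{1-\gamma} - k_{\min}^{1-\gamma}} \left[ k_{\min}^{1-\gamma}\, E_{\gamma}\bigl(-k_{\min}\log(1-p)\bigr) - k_{\max}^{1-\gamma}\, E_{\gamma}\bigl(-k_{\max}\log(1-p)\bigr) \right]. \tag{6} $$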



For RDS with degree-dependent probability we have $X(k) = \min\bigl(1, (k/\kappa)^{\beta}\bigr)$, 
resulting in Eq. (7) (derived in the Supplementary Information, Section S.3). 

Figure 4 | Sizes of probabilistic dominating sets (RDS) in real networks. Subfigures (a), (c) and (e) show the RDS size as a function of the p prefactor of node 
selection probability [Eq. (3)], while subfigures (b), (d) and (f) show the same as a function of the $\kappa$ degree cutoff [Eq. (4)]. For comparison, the 
degree-independent probabilistic ($\beta = 0.00$), degree-ranked and greedy dominating set sizes are also plotted. Network parameters: ca-HepTh: N = 8638, $\langle k \rangle = 6$, 
$\gamma = 2.2$; ca-CondMat: N = 21363, $\langle k \rangle = 8$, $\gamma = 2.7$; ca-HepPh: N = 11204, $\langle k \rangle = 21$, $\gamma = 1.7$. 

Figure 5 | Comparison of analytical estimates [Eqs. (6), (7), and (12)] and numerically computed sizes of RDS and CDS in uncorrelated (cCONF) 
scale-free networks. For numerical results, data is averaged over 200 network samples. Parameters: N = 5000 and $\langle k \rangle = 16$. 

Finally, for CDS we have $X(k) = \Theta(k - \kappa)$, where $\Theta$ is the 
Heaviside step function that returns 1 for positive arguments and 0 
otherwise, yielding: 



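$$ \frac{\langle CDS \rangle}{N} = \frac{k_{\max}^{1-\gamma} - \kappa^{1-\gamma} + (1-\gamma)\left[ k_{\min}^{1-\gamma}\, E_{\gamma}(-k_{\min}\log b) - \kappa^{1-\gamma}\, E_{\gamma}(-\kappa\log b) \right]}{k_{\max}^{1-\gamma} - k_{\min}^{1-\gamma}}, \tag{12} $$

with 

$$ b = \frac{\kappa^{2-\gamma} - k_{\min}^{2-\gamma}}{k_{\max}^{2-\gamma} - k_{\min}^{2-\gamma}}. \tag{13} $$

Note that in all the above formulas, $E_n(z)$ denotes the exponential 
integral function, $E_n(z) = \int_1^{\infty} e^{-zt}\, t^{-n}\, dt$. The detailed derivation of 
the analytical estimates can be found in Supplementary Information, 
Section S.3. 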
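
Eq. (5) can also be evaluated numerically without closed-form expressions. The sketch below does this for an uncorrelated network with a continuous power-law degree distribution, using simple midpoint quadrature on a logarithmic grid. The function names, the grid size and the example parameters are assumptions made for illustration; this is not the authors' implementation, and no specific numerical values from the paper are reproduced here.

```python
import math


def log_grid(gamma, k_min, k_max, n=4000):
    """Log-spaced midpoints with normalized weights approximating P(k) dk."""
    step = math.log(k_max / k_min) / n
    ks = [k_min * math.exp((i + 0.5) * step) for i in range(n)]
    # P(k) dk = k**(-gamma) * k * dlog(k), up to normalization
    w = [k ** (1.0 - gamma) * step for k in ks]
    total = sum(w)
    return ks, [x / total for x in w]


def expected_ds_fraction(X, gamma, k_min, k_max, n=4000):
    """Eq. (5) for an uncorrelated network, where P(k'|k) = k' P(k') / <k>."""
    ks, w = log_grid(gamma, k_min, k_max, n)
    mean_k = sum(k * wk for k, wk in zip(ks, w))
    # Probability that a randomly chosen neighbor is not in X.
    q = sum((1.0 - X(k)) * k * wk for k, wk in zip(ks, w)) / mean_k
    in_x = sum(X(k) * wk for k, wk in zip(ks, w))                    # <|X|>/N
    in_y = sum((1.0 - X(k)) * wk * q ** k for k, wk in zip(ks, w))   # <|Y|>/N
    return in_x + in_y


# Example selection rules: uniform p (RDS) and a hard degree cutoff (CDS).
uniform = expected_ds_fraction(lambda k: 0.3, 2.5, 3.0, 1000.0)
cutoff = expected_ds_fraction(lambda k: 1.0 if k > 20.0 else 0.0, 2.5, 3.0, 1000.0)
```

Any selection rule X(k) can be passed in as a function, so the same routine covers the degree-dependent RDS as well; handling correlated networks would require replacing the single neighbor probability `q` with a k-dependent one computed from P(k'|k).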

Figure 5 shows the accuracy of our analytical estimates in com- 
parison with the numerical results of RDS and CDS. Further results 
on scale-free networks with different $\langle k \rangle$ and $\gamma$ values are provided in 
the Supplementary Information, Section S.3.4, showing that as 
$\langle k \rangle$ increases, the accuracy of the analytical estimates improves. For 
CDS and degree-independent RDS the estimates are very close to the 
numerically obtained values, even with a small $\langle k \rangle$. The estimates for 
degree-dependent RDS are slightly less accurate, but still sufficient to 
provide a useful approximation of the expected dominating set size. 
Therefore, we can easily calculate a very accurate expected size of 
these dominating sets in uncorrelated scale-free networks, based on 
nothing beyond basic network parameters. 

Effects of network assortativity. Using our edge-mixing method to 
control the assortativity of a network (see Methods and Fig. 6), we 
have compared the sizes of dominating sets as a function of 
assortativity, measured by Spearman's $\rho$. Figure 7 shows our 
results for a synthetic network and a real social network, while the 
same comparison for different network parameters is provided in the 
Supplementary Information, Section S.5 for artificial networks, and 
Section S.6 for real networks. 

As expected, the size of most dominating sets increases with higher 
assortativity, except for RDS with degree-independent selection 
probability. The most dramatic size increase is observed in DDS, which 
indicates that this method can only be considered viable in real-world 
applications for highly disassortative networks. Also, as the 
assortativity increases, CDS becomes larger than the simple RDS at a certain 
point, indicating that favoring high-degree nodes as dominators is 
not an effective strategy when the network is highly assortative. While 
the MDS size obtained by greedy search also increases with increasing 
assortativity, it shows the smallest increase, thus the advantage of 
greedy search over other methods is more pronounced. 

Figure 6 | Relationship between the assortativity control parameter a and the achieved Spearman's $\rho$ values in scale-free networks with (a) no structural 
cutoff (CONF), (b) structural cutoff (cCONF). Parameters: N = 5000, $\gamma = 2.5$, $\langle k \rangle = 14$. Data is averaged over 100 network samples. Error bars indicate 
the sample standard deviation. 

We also analyze the effects of assortativity on the optimal $\kappa$ degree 
threshold value that minimizes the size of CDS. Figure 8 provides a 
complete dependence map of the optimal $\kappa$ with respect to two vital 
network parameters: power-law degree exponent $\gamma$, and assortativity, 
measured by Spearman's $\rho$. Regardless of $\gamma$ and $\rho$, we can see that $\kappa$ is 
roughly proportional to the network's average degree. Also, we 
observe that for any particular network assortativity (and $\rho$ value), 
$\kappa \sim e^{-\gamma}$. However, it is intriguing that for a fixed $\gamma$ value, $\kappa$ has a 
maximum approximately at $\rho = -0.4$. 

Discussion 

Our first results revealed that the numerically computed size of RDS 
with uniform node selection probability is much smaller than the 
upper bound provided by Alon and Spencer 12 . Since their bound 
assumes that all nodes not dominated by the X set are, in the worst 
case, nodes of the smallest degree, the difference between their bound 
and our result shows the relative number of nodes that have higher 
degree neighbors, yet are not dominated. In scale-free networks, we 
indeed expect to find a significant number of lowest degree nodes 
with high-degree neighbors (especially in disassortative networks), 
explaining our observations. 

It is also remarkable that RDS with optimally chosen p parameter 
can always provide a smaller dominating set than a simple 
degree-ranked node selection. While the latter may be favored for its 
simplicity and plausibility to be effective in heterogeneous networks, our 
results show that it is not the case; the usefulness of degree-ranked 
dominating sets beyond theoretical studies is very limited. 

The cutoff dominating set (CDS), proposed as a limiting case of 
RDS with degree-dependent node selection probability, has proven to 
be a very effective dominating set selection method. Given full 
network information, a sequential implementation of the algorithm 
finds CDS for all possible $\kappa$ degree threshold values in O(E) time. 
However, since the algorithm only uses local connectivity 
information, a distributed version can be easily designed for large networks. 
Further, the value of optimal $\kappa$ (that minimizes the CDS size) has 
little dependence on particular network parameters, as shown in 
Fig. 8, thus it can be estimated easily if detailed network information 
is not available. Based on our extensive numerical simulations, we 
conjecture that using the optimal $\kappa$ the CDS size is the smallest of all 
degree-dependent RDS (with any $\beta$), and it approaches the MDS size 
provided by the greedy algorithm, irrespective of the network's 
distinct topological properties. This conjecture is further supported by 
Fig. 4 that presents results on several real-world network samples. 

We can also understand CDS as a method that bridges the degree- 
ranked and greedy dominator selection methods. When selecting the 
very first nodes of the dominating sets, both greedy and degree- 
ranked methods start by selecting the highest degree nodes. Later, 
they diverge; the degree-ranked selection continues with the high- 
degree nodes, while greedy specifically seeks out nodes that increase 
domination maximally, typically smaller degree nodes. The degree- 
ranked selection eventually becomes very inefficient only because of 
the presence of low degree nodes connected only to each other (and 
hard to reach). Thus, degree-ranked selection is efficient at first, but 
there is a point at which the method should abandon such selection 
and instead look for nodes that are still not dominated, and target 
them specifically. This is exactly what CDS does: it is essentially a 
degree-ranked selection until $\kappa$ is reached (set X), and then the 
remaining undominated part is simply added as dominators (set Y). 

Figure 7 | Dominating sets as a function of Spearman's $\rho$ assortativity measure in (a) a synthetic network and (b) a real-world network (Gnutella). 
Networks with assortativity values different from the original network are obtained by guided edge-mixing with double-edge swaps. 

While the analytical estimates for RDS and CDS are highly accur- 
ate, they are only applicable to uncorrelated scale-free networks. 
However, the base formula [Eq. (5)] can be used for any network 
(not only scale-free), if the degree distribution and degree correla- 
tions can be expressed (or approximated) by some formula. Without 
analytical expressions, one can still calculate the base formula 
numerically, using observed (sampled) estimates of the degree dis- 
tribution and degree correlations, assuming that collecting these 
estimates requires less time than actually running the RDS or CDS 
algorithms, or if full adjacency information is not available. 

The accuracy of our analytical estimates for RDS and CDS seems to 
be lower for low $\langle k \rangle$ and $\gamma$ values. This inaccuracy is an artifact of our 
average degree control method, which controls $\langle k \rangle$ by adjusting $k_{\min}$, 
and removing a certain fraction of smallest degree nodes. The latter 
becomes significant when $k_{\min} \to 1$ (for low $\langle k \rangle$), because it causes a 
slight deviation from a perfect power-law degree distribution. In 
order to use the analytical formulas (which are very sensitive to $k_{\min}$), 
we have to estimate a fractional $k_{\min}$, as if it were a cutoff of a 
continuous and otherwise perfectly satisfied power-law distribution. 
In reality, we deviate from power-law, leading to the inaccurate 
estimates. However, as $\langle k \rangle$ increases, $k_{\min}$ also increases, and the relative 
deviation from a perfect power-law decreases, hence the increased 
accuracy. The implication for real networks is that we can expect 
similarly less accurate estimates if the degree distribution deviates 
from power-law. 

Our numerical study of dominating set sizes with respect to assor- 
tativity reveals a general tendency that the dominating set becomes 
larger as assortativity increases. We can understand this easily. In 
case of a disassortative network, high degree nodes connect mostly to 
low degree nodes, therefore we can expect small dominating sets, due 
to efficient domination via high-degree nodes. In fact, when $\gamma < 2$ 
scale-free networks may become so disassortative that star subgraphs 
form and the size of MDS becomes O(1) 34. On the other hand, hubs 
are less effective in dominating assortative networks, since most of 
their connections are used to connect to other high degree nodes. 
Therefore, the impact of assortativity on each dominating set selec- 
tion method depends on how much the method relies on high-degree 
nodes as dominators. This is why the degree-ranked selection shows 
the worst performance on highly assortative networks, followed by 
the degree-dependent RDS (and its limiting case, the CDS), which 
also favors high-degree nodes. Technological scale-free networks 
tend to be disassortative, and although social networks tend to be 
assortative, extreme assortativity is rare; we can therefore conclude 
that CDS is a viable alternative to greedy selection for most scale-free 
networks. 

In summary, we explored probabilistic dominating set selection 
strategies in scale-free networks with respect to various network 
properties. We found that as a particular limiting case of degree- 
dependent random node selection, a deterministic cutoff dominating 
set (CDS) provides the smallest dominating set among probabilistic 
methods, and is widely applicable to heterogeneous networks. Even if 
full adjacency information is not available, the size of CDS (and RDS) 
can be accurately predicted using our analytical estimates. 

Methods 

We construct our ensembles of synthetic scale-free networks (undirected and 
unweighted) using the configuration model 45,46. First, we generate a power-law degree 
distribution with the desired power-law exponent and average degree. The latter is 
controlled by adjusting the minimum degree cutoff of the distribution, while we 
always keep the maximum degree cutoff $k_{\max}$ fixed: either $k_{\max} = N - 1$ (the 
maximum possible in any network, hence essentially unrestricted) or $k_{\max} = \sqrt{N}$ 
(structural cutoff, making the network uncorrelated 47,48). We obtain a degree sequence from 
the degree distribution by inverse transform sampling. Given the degree sequence, the 
configuration model assigns the corresponding number of half-edges (stubs) to each 
node, and connects randomly (uniformly) chosen pairs of stubs to form links between 
nodes. This procedure is repeated until there are no free stubs left. The result is a 
multigraph; however, we convert multiple links to single links and remove self-loops 
to obtain a simple graph. Due to this pruning of multiple links we have some loss of 
edges, but since we generate networks with $\gamma > 2$, this loss is negligible. We have used 
the same network construction method in our previous work 34 (including the method 
of controlling the average degree by selecting the proper $k_{\min}$ value from a 
precompiled lookup table); according to our previous notation we have here CONF and 
cCONF networks. 

SCIENTIFIC REPORTS | 4:6308 | DOI: 10.1038/srep06308 
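
The construction described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' generator: degrees are drawn by inverse transform sampling from a continuous power law and rounded down, an odd stub count is fixed by dropping one stub, and the average-degree control via a lookup table for $k_{\min}$ is not reproduced. The function names are hypothetical.

```python
import random


def sample_degrees(n, gamma, k_min, k_max, rng):
    """Inverse transform sampling from P(k) ~ k**(-gamma), rounded down to integers.

    Assumes gamma != 1 and integer cutoffs k_min <= k_max.
    """
    a = k_min ** (1.0 - gamma)
    b = (k_max + 1) ** (1.0 - gamma)
    degrees = []
    for _ in range(n):
        u = rng.random()
        k = (a + u * (b - a)) ** (1.0 / (1.0 - gamma))   # inverse of the CDF
        degrees.append(min(k_max, max(k_min, int(k))))   # guard against rounding error
    return degrees


def configuration_model(degrees, rng):
    """Pair stubs uniformly at random, then drop self-loops and multiple links."""
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    if len(stubs) % 2:
        stubs.pop()            # assumption: discard one stub so all stubs can be paired
    rng.shuffle(stubs)         # consecutive pairs of a shuffled list form a random matching
    adj = {v: set() for v in range(len(degrees))}
    for u, v in zip(stubs[::2], stubs[1::2]):
        if u != v:             # remove self-loops
            adj[u].add(v)      # sets collapse multiple links into a single link
            adj[v].add(u)
    return adj
```

The result is an adjacency dict mapping each node to the set of its neighbors, so node degrees after pruning can be slightly smaller than the sampled ones.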

We use two types of dominating sets for comparison with probabilistic dominating 
set selection methods. The first one is an approximation of the MDS, found by a 
sequential greedy algorithm. This method selects nodes one by one, at each step 
selecting a node that provides the maximal increase in the number of dominated 
nodes in the network (with random tie-breaking); this is the same method as used in 34. 
The second method is the degree-ranked dominating set selection (DDS), where we 
build the dominating set by selecting nodes in decreasing order of degree (with 
random tie-breaking) until the selected set dominates the entire network. 
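
Both reference methods can be sketched as follows. This is a deliberately simple illustration, assuming the network is given as a dict mapping each node to the set of its neighbors; the greedy search rescans all nodes at every step, so it is much slower than an optimized implementation would be. The function names are chosen for this example only.

```python
import random


def is_dominating(adj, S):
    """True if every node is in S or has at least one neighbor in S."""
    return all(v in S or adj[v] & S for v in adj)


def greedy_dominating_set(adj, rng=None):
    """Repeatedly select a node that dominates the most undominated nodes."""
    rng = rng or random.Random()
    undominated = set(adj)
    S = set()
    while undominated:
        best_gain, best = 0, []
        for v in adj:
            gain = (v in undominated) + len(adj[v] & undominated)
            if gain > best_gain:
                best_gain, best = gain, [v]
            elif gain == best_gain and gain > 0:
                best.append(v)
        v = rng.choice(best)                 # random tie-breaking
        S.add(v)
        undominated.discard(v)
        undominated -= adj[v]
    return S


def degree_ranked_dominating_set(adj, rng=None):
    """Select nodes in decreasing order of degree until all nodes are dominated."""
    rng = rng or random.Random()
    order = list(adj)
    rng.shuffle(order)                                    # random order within ties ...
    order.sort(key=lambda v: len(adj[v]), reverse=True)   # ... preserved by the stable sort
    undominated = set(adj)
    S = set()
    for v in order:
        if not undominated:
            break
        S.add(v)
        undominated.discard(v)
        undominated -= adj[v]
    return S
```

On a star graph both functions return only the hub, while on graphs with many low-degree nodes connected to each other the degree-ranked set grows much larger than the greedy one.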

To find a probabilistic dominating set (with any particular node selection prob- 
ability), we use the following algorithm. First, we initialize set Y to contain all nodes of 
the network, and initialize set X to an empty set. Then, we visit each node exactly once, 
and determine whether it should be added to set X, based on the current node 
selection probability. If so, then we add the current node to set X, remove it from set Y 
(if present), and also remove all of its neighbors present in set Y. This way, once all 
nodes have been evaluated, we obtain the probabilistic dominating set as $X \cup Y$. We 
use hashed sets for X and Y, which makes the addition, check of containment, and 
removal of nodes from the sets an O(1) time operation (in amortized time). We loop 
over all nodes exactly once, and visit the neighbors of every selected node, therefore we visit each edge 
at most twice, making the algorithm's time complexity O(E) and memory complexity 
O(N), where E is the number of edges and N is the number of nodes in the network. 
Note that for sparse networks with small average degree, O(E) = O(N). 
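
The algorithm described above translates almost line by line into code. The following is a minimal sketch, assuming the network is stored as a dict mapping each node to the set of its neighbors and that the selection probability is supplied as a function of the node; both the function name and this interface are choices made for the example.

```python
import random


def probabilistic_dominating_set(adj, select_prob, rng=None):
    """Build X by independent random selection, then add the undominated rest (Y)."""
    rng = rng or random.Random()
    X = set()
    Y = set(adj)                            # initially every node is undominated
    for v in adj:                           # each node is visited exactly once
        if rng.random() < select_prob(v):
            X.add(v)
            Y.discard(v)                    # v now dominates itself ...
            Y.difference_update(adj[v])     # ... and all of its neighbors
    return X | Y
```

For the degree-independent RDS one would pass `lambda v: p`; for the degree-dependent variant, a function that evaluates Eq. (3) or Eq. (4) from `len(adj[v])`.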

When we calculate a cutoff dominating set (CDS), there is an additional 
optimization we use to find the CDS size for all possible $\kappa$ degree cutoff values, including the 
optimal one that minimizes CDS size, in the same time complexity as finding CDS for 
only one $\kappa$ value. First, we sort nodes into degree classes in O(N) time using counting 
sort (or bucket sort). The linear time complexity comes from the fact that both the 
number of nodes and the range of their degree values are O(N). Then, we loop over all 
degree classes in decreasing order of degree, and for each degree class we add all nodes 
to set X (and remove them and their neighbors from set Y at the same time). This way, 
we can check the value of |X| + |Y| after finishing each degree class, which is exactly 
the size of CDS with $\kappa$ equal to the current class degree. We can either output the size 
of CDS at the current degree, or simply record which CDS size at which $\kappa$ was the 
smallest. Since we process each node exactly the same way as in RDS (except for the 
specific order in which they are processed), we have the same O(E) time complexity, 
and it is not increased by the O(N) time needed to sort the nodes. 
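
The single-pass sweep over all cutoff values can be sketched as below. The code assumes a simple graph stored as a dict mapping each node to the set of its neighbors, so that all degrees lie between 0 and N - 1; the function names `cds_sweep` and `best_cutoff` are hypothetical.

```python
def cds_sweep(adj):
    """Return {kappa: CDS size} for every degree value kappa present in the network.

    The entry for kappa is the size of X ∪ Y when X holds all nodes of degree >= kappa.
    """
    n = len(adj)
    buckets = [[] for _ in range(n)]        # counting sort by degree (0 .. n-1)
    for v, nbrs in adj.items():
        buckets[len(nbrs)].append(v)
    x_size = 0
    Y = set(adj)                            # nodes not yet dominated by X
    sizes = {}
    for k in range(n - 1, -1, -1):          # degree classes in decreasing order
        if not buckets[k]:
            continue
        for v in buckets[k]:
            x_size += 1
            Y.discard(v)
            Y.difference_update(adj[v])
        sizes[k] = x_size + len(Y)          # |X| + |Y| after finishing this class
    return sizes


def best_cutoff(adj):
    """Return (kappa, size) for the cutoff that gives the smallest CDS."""
    sizes = cds_sweep(adj)
    kappa = min(sizes, key=sizes.get)
    return kappa, sizes[kappa]
```

Only the sizes are recorded here; to recover the set itself for the best cutoff, one can rerun the selection once with that fixed value.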

We control assortativity by randomly mixing the network's edges, using a Markov- 
chain of double-edge swaps with biased acceptance probabilities. Without the bias, 
this method was used in 36 and 34 to sample networks with a given degree distribution. 
Here we use the same method, but we introduce an additional condition for accepting 
a randomly proposed and otherwise possible edge swap. This acceptance probability 
is parametrized by a control value $a \in [-1, 1]$ that introduces the bias toward 
accepting a higher or lower fraction of swaps that make the network more assortative, 
based on its value, in the following way: 

$$ \begin{cases} a & \text{if } a > 0 \text{ and the swap makes the network more assortative} \\ -a & \text{if } a < 0 \text{ and the swap makes the network more disassortative} \\ 1 - |a| & \text{otherwise.} \end{cases} $$

Therefore, we obtain the most disassortative network when $a = -1$ and the most 
assortative network when $a = 1$. However, the relationship between a and any 
particular assortativity measure, such as Spearman's $\rho$, is non-trivial, as shown in 
Fig. 6. The detailed description of this method is included in the Supplementary 
Information, Section S.4. 